1 Introduction

Recent anti-trust investigations of the internet search market in the US and Europe have considered to what extent search engines have the ability to influence traffic to websites. It is well known that the ranking (i.e. the hierarchical physical location on a search-results page) of websitesFootnote 1 is positively correlated with click-through rates (CTRs).Footnote 2 If this correlation reflected a causal impact of ranking on CTRs, then search engines with a large share of total search activity would influence a large amount of traffic to websites.

How could a correlation between rank and CTRs arise if there were no causal impact of the former on the latter? This might occur through reverse causation: The search engine might accurately predict the relevance of websites to users (and therefore their likely future CTRs) and then place websites on the page as a function of this prediction.

Using a unique dataset of individual search behavior we show that there is indeed a strong positive correlation between the rank of a website on a given search engine results page (SERP) and the probability that an individual will click on that website. Although part of this correlation can be explained by the predicted relevance of the website, there is a substantial direct causal impact even when this is taken into account.

We find that being at the top of the ranking in the algorithmic search results has a large and statistically significant causal impact on the odds of receiving a user click, and that moving the website from rank 1 to rank 2 on the same page decreases the odds of a click by between one third and two thirds depending on the specific search that is undertaken. We concede that no single statistical method completely eliminates endogeneity concerns; however, our results are robust and all evidence points to very high economic significance of the algorithmic rank.

There are no studies to our knowledge in the economic literature that estimate systematically the effect of rank in the algorithmic search results using individual user data. Athey and Meidan (2011) and Athey and Nekipelov (2011) are important papers on the analysis of user behavior in the paid results. Previous research has indicated that CTRs can increase markedly for results placed at the top of a results page compared to other ranks and pages (Smith and Brynjolfsson 2001; Xu and Kim 2008; Ghose and Yang 2009). Of these studies, Xu and Kim (2008) is based on a small-sample laboratory experiment, and Ghose and Yang (2009) uses data on paid search from an advertiser. The closest study in spirit to ours is Smith and Brynjolfsson (2001), which uses individual data from an Internet book-purchase site, but this relates to purchases of homogeneous books rather than searches for heterogeneous websites.

Armstrong et al. (2009) and Armstrong and Zhou (2011) study the welfare implications of “prominence” in search markets. Among studies investigating the impact of placement on sponsored search results, Jerath et al. (2011) demonstrate the existence of a “position paradox” where advertisements at higher positions obtain more clicks, but this effect can be offset by a superior firm reputation. The paradox is that the superior firm may make higher profits from bidding for lower ranked positions. In a similar vein, Baye et al. (2012), using search data that were aggregated by retailer, consider the impact of product position and product reputation on the organic search results page on CTRs and find that both are important factors.

We proceed as follows. In Sect. 2 we describe the data selection process and provide summary statistics for our dataset. We also describe the nature of the search engine algorithm and provide descriptive evidence about the determinants of ranking. In Sect. 3 we demonstrate econometrically the effect of ranking on the probability of clicking on a website. Section 4 investigates the contribution of reputation and conspicuousness to enabling page rank to influence click probabilities. Section 5 concludes.

2 Data

2.1 Query Term Selection

Microsoft Corporation provided us with access to the database of the Bing log files for individual user searches during November and December 2010 and January 2011. All search engines store data from user sessions in detailed logs. The Bing logs contain recorded observations for each of the millions of Bing user queries, including for each query: a record of the date and time; all websites that were displayed on the SERP generated from the search; each website’s position on the SERPs; and which websites were clicked. For each website that appeared in a set of search results, we know at what rank it appeared in each view and whether it was clicked on during that view.

In order to isolate the impact of website relevance from that of page rank, we need query terms where the website relevance to the user query remains reasonably constant during the time period of study, while the ranking of websites varies (even if only slightly). We also need to eliminate as far as possible other confusing influences. To find suitable data, we first categorized a list of available query terms and then eliminated the non-suitable categories until we arrived at a final list of queries.

A first type of unsuitable query is one that generates what are known as “highly monetized” results. For example, the query term “airline tickets” signals the intent to shop for airline tickets on-line and, because it is defined in generic terms, occurs with relatively high frequency. The intent to make a purchase and the high frequency make this query attractive to advertisers, and the results page is highly monetized: There is a large volume of ads. The ads distract from the algorithmic results and introduce more “noise” into the algorithmic click behavior data. In order to predict click behavior on the algorithmic results we would need to know all of the paid results as well (whose presence might well be endogenous). As a consequence, these queries are not suitable for our analysis.

A second type of unsuitable query is what is known as “superfresh”. Consider the query term “Obama approval rating”. The intent is to look for current news, and every day (sometimes every hour) a different set of websites will be most relevant and appear in the top ranks. This variability in website relevance, which we cannot directly observe and for which we cannot control, makes such query terms unsuitable for our analysis.

A third type of unsuitable query is “navigational”, where the user has a prior intent to navigate to a specific website. An example of this is one of the most frequent queries, “facebook”, for which the search results display the different subpages of this website. Although a large proportion of query terms have some corresponding domain name and thus could in theory be navigational, queries become unsuitable for our purposes only when such query terms regularly appear among the top results on the page.

Finally, query terms that arise from non-uniform intent across users are also unsuitable. One example is the query “eclipse”. Based on the websites that are displayed on the results page, this search has at least three possible intents: to learn about a solar or lunar eclipse, to find information about a software product that is known as Eclipse, and to search for one of the Twilight Saga books with this title (which is a teenage vampire romance novel).

Thus, we manually sorted through an extensive list of queries, and found four query terms that were suitable for our purposes. In alphabetical order these are: “Free Movies”, “Fun Games”, “Phone Numbers” and “Sports”.Footnote 3 Although some of these query terms are now monetized, none were so at the time of our study. None related to newsworthy events that might have had an impact on relevance. None were primarily navigational, and none showed significant evidence of non-uniform intent.Footnote 4

2.2 Algorithm

Algorithms are sometimes patented (the Google PageRank algorithm is covered by U.S. Patent No. 6,285,999) and exact formulas are held as trade secrets. However, the general characteristics of search algorithms are known. The paper that introduced Google (Brin and Page 1998) states that “Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.” The fundamental ranking techniques of a search engine algorithm depend on natural language processing of the content of websites, topological analysis of the connections between websites, and analysis of the interactions of consumers with search results, among other things.

A search engine algorithm proceeds in two steps: choosing the websites that match the query term and then putting them in ranking order. The first step uses keyword-focused measures, which examine the placement and count of the query term words in a website name and anchor text.Footnote 5 Once the set of websites to be displayed in the SERP is determined, they are ranked using natural language techniques, static rankFootnote 6 and user behavior data, such as prior website traffic and prior CTR.

This obviously raises a concern about reverse causality: It may be previous CTRs that determine ranking rather than ranking that determines future CTRs. Based on discussions with the engineers who provided us with the Bing data, we believe that at the time of our study (11/1/2010–1/31/2011), and for our selected query terms, the Bing algorithm relied on website CTRs that were calculated over long prior periods of time, and was refreshed only occasionally. As we illustrate further below, fluctuations in the CTR over short periods of time do not seem to be a determinant in Bing ranking for the query terms that we selected.

During the study period, some instability remained in the relatively new Bing algorithm, which can cause variation in ranks and is most probably the cause of the variation in page rank in our data.Footnote 7 In addition, during this study period, the results of the Bing algorithm were not personalized to user characteristics, which further alleviates many potential data concerns.

2.3 Sample Statistics

Our sample consists of those websites that appear on Bing on the first SERP (in positions 1–10) for each of the four query terms considered. “Free Movies” resulted in views for 262 such distinct websites, “Fun Games” for 158, “Phone Numbers” for 322, and “Sports” for 996.

However, not all websites had views in all ten positions. As an illustration, Table 1 displays the top five websites (as determined by the total number of views for the time period of our analysis) for the query term “Phone Numbers”; they are displayed in the order of frequency of appearance in Rank 1.

Table 1 Top five websites for “Phone Numbers”

For each of the five websites, Table 1 shows how many views each website had in each rank during our sample period, and what the website CTRs were in each rank. For example, website phonenumber.com had 17,075 views in rank 1, and 29.5 % of the views resulted in a click-through (CTR is 0.295). The statistics for each query term show that being in the top rank is associated with higher CTRs for each domain.
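The per-rank views and CTRs of the kind reported in Table 1 can be tabulated from view-level records along the following lines. This is only a minimal sketch: the record layout `(website, rank, clicked)`, the function name `ctr_by_rank` and the toy records are assumptions made for illustration and are not taken from the Bing logs.

```python
from collections import defaultdict


def ctr_by_rank(impressions):
    """Tabulate views and CTR for every (website, rank) pair.

    impressions: iterable of (website, rank, clicked) tuples, one per view
    (a hypothetical layout, not the actual log schema).
    Returns {(website, rank): (views, ctr)}.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for website, rank, clicked in impressions:
        views[(website, rank)] += 1
        clicks[(website, rank)] += int(clicked)
    # CTR = clicks / views within each (website, rank) cell
    return {key: (n, clicks[key] / n) for key, n in views.items()}


# Toy records, invented for illustration only
toy = [
    ("a.example", 1, True),
    ("a.example", 1, False),
    ("a.example", 2, False),
    ("b.example", 2, True),
    ("b.example", 1, False),
    ("b.example", 1, False),
    ("b.example", 1, True),
    ("b.example", 1, False),
]
table = ctr_by_rank(toy)
for (website, rank), (n, ctr) in sorted(table.items()):
    print(f"{website} rank {rank}: {n} views, CTR {ctr:.3f}")
```

Keeping the cell at the level of (website, rank) rather than rank alone is what allows the same website to be compared across the different ranks in which it was viewed.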

In addition, the frequency with which the top three websites appear in the top rank is also often, though not always, reflected in the ordering of their CTRs when they appear in the second rank, suggesting that some of the ranking frequency may reflect perceived website relevance. In particular, two websites (phonenumber.com and whitepages.com) are competing for the top spot on the page. Phonenumber.com has 17,075 views in rank 1 (with top rank CTR of 0.295) and whitepages.com has 14,652 (CTR is 0.274): When one website is in rank 1, the other website is usually displayed in rank 2. Phonenumber.com is slightly more relevant to the user query, since it is being clicked on more often in nearly every rank compared to whitepages.com.Footnote 8 This is consistent with the observation that phonenumber.com is observed in rank 1 more often.

Tables 2, 3 and 4 present the same statistics for the other three query terms, and display broadly similar characteristics.

Table 2 Top five websites for “Free Movies”
Table 3 Top five websites for “Fun Games”
Table 4 Top five websites for “Sports”

These data naturally raise the question of what triggers changes in ranking. In particular, we are interested in whether the data are consistent with our claim that changes in ranking are more likely to reflect random events than to have been triggered by prior changes in CTRs. To examine this further, Fig. 1 has the time series of the daily CTR (dotted line) and daily percent of views in Rank 1 (solid line) for the two leading websites for the “Phone Numbers” query.

Fig. 1 CTR and % of views in Rank 1 daily. Query “Phone Numbers”: (a) phonenumber.com, (b) whitepages.com

Our main concern is whether the changes in CTR trigger the switch between the ranks for these websites. This does not appear to be the case. It is easy to observe the level change in CTR once a website is displayed in Rank 1 more often, and the changes in CTR appear to occur after—rather than to precede—the switch between the ranks.

However, visual inspection of Fig. 1 is not the way to settle the question. Our conjecture is confirmed by Granger causality tests that were run for both websites. The summary statistics of the sample used for the Granger causality tests can be found in Table 5, and the results of the Granger causality tests are reported in Table 6. Note that the proportion of views in which the domain appears in rank 1 may sum to greater than unity across websites: For instance, it would be possible for two domains each to appear in rank 1 whenever they are viewed (thus 100 % of the time), provided the domains are never viewed on the same page.

Table 5 Summary statistics for domains used in Granger causality tests
Table 6 Granger causality: daily CTR and % in rank 1

To determine the direction of causality between daily percentage of views in which the website appears in Rank 1 and its daily CTR, we perform a Wald test for the null hypothesis that lagged values of the former can be excluded from a regression of the latter, and vice versa. For the “Phone Numbers” query, we can clearly reject the null hypothesis that prior page rank has no effect on current CTRs: The F-statistic for the exclusion of the percentage of time spent in Rank 1 from the equation for CTR is significant at 1 % for one domain and 0.1 % for the other. On the other hand, we fail to reject the null hypothesis that prior CTR has no effect on current page rank.
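The exclusion test described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the lag length (one), the helper names `ols_rss` and `granger_f`, and the simulated daily series are all hypothetical, and a statistics package would normally be used in place of the hand-written least-squares step.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares from OLS of y on the columns of X."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum(
        (yi - sum(b * x for b, x in zip(beta, row))) ** 2
        for row, yi in zip(X, y)
    )


def granger_f(y, x, lags=1):
    """F-statistic for H0: lagged x can be excluded from a regression
    of y on a constant and its own lags."""
    T = len(y)
    target = y[lags:]
    restricted = [
        [1.0] + [y[t - l] for l in range(1, lags + 1)] for t in range(lags, T)
    ]
    unrestricted = [
        row + [x[t - l] for l in range(1, lags + 1)]
        for row, t in zip(restricted, range(lags, T))
    ]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n, k = len(target), 1 + 2 * lags
    return ((rss_r - rss_u) / lags) / (rss_u / (n - k))


# Simulated daily series (invented): yesterday's share of views in
# Rank 1 moves today's CTR, but not the other way round.
random.seed(0)
pct_rank1 = [random.random() for _ in range(60)]
ctr = [0.25] + [
    0.2 + 0.1 * pct_rank1[t - 1] + random.gauss(0, 0.005) for t in range(1, 60)
]
f_rank_to_ctr = granger_f(ctr, pct_rank1)   # H0: prior rank does not affect CTR
f_ctr_to_rank = granger_f(pct_rank1, ctr)   # H0: prior CTR does not affect rank
print(f_rank_to_ctr, f_ctr_to_rank)
```

In the simulated data the first statistic is large and the second is not, which is the pattern reported for the “Phone Numbers” query; each statistic would be compared with the critical value of the F distribution with `lags` and `n - k` degrees of freedom.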

For the other queries the evidence is more mixed. For “Sports” the results are similar to “Phone Numbers” but at slightly lower levels of significance (5 %). For “Fun Games” there is no evidence of Granger-causality in either direction, while for “Free Movies” there is evidence of two-way causality for one domain and none for the others.

Overall, for two query terms we clearly fail to reject the hypothesis, suggested to us by Bing engineers, that prior CTR is not used to determine the rank of the website. For the other query terms there is evidence of possible influence of CTR on page rank for only one of the domains used. On balance the hypothesis of lack of reverse causality seems broadly plausible given the evidence available to us.

3 Econometric Estimation

In order to estimate the effect of page rank on click probabilities we use the multinomial logit model that was developed by McFadden and used for a large variety of situations in which users make a single choice from a range of discrete options. This means that instead of estimating determinants of CTRs over a given time period we estimate the odds that a website in a given page rank is clicked on, relative to a website in the baseline Rank 10, averaged across all SERPs that gave rise to a user click.Footnote 9 This therefore allows us to abstract from the many factors that can affect CTRs, such as time of day, since these factors do not vary between alternatives that are presented to the user in a given page view.

The results are presented in Tables 7, 8, 9 and 10 for the four query terms. For ease of interpretation the coefficients are presented as odds ratios, so that the effect of a given rank should be understood as the odds that the user clicks on a website in that rank divided by the odds of clicking on a website in rank 10. An odds ratio of 1 would therefore imply no effect: the rank in question was no more likely to be clicked on than is rank 10. Odds ratios less than one imply a negative effect; odds ratios greater than one imply a positive effect.
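One way to see how such odds ratios arise is to write the rank-only specification in a standard conditional-logit form. The notation here is assumed for illustration and does not appear in the original: \(\beta_r\) is the coefficient on the dummy for rank \(r\), with the baseline normalized to \(\beta_{10} = 0\).

```latex
\[
P(\text{click on the website in rank } r \mid \text{page view})
  = \frac{\exp(\beta_r)}{\sum_{s=1}^{10} \exp(\beta_s)},
\qquad \beta_{10} = 0,
\]
\[
\frac{P(\text{click on the website in rank } r)}
     {P(\text{click on the website in rank } 10)}
  = \exp(\beta_r - \beta_{10}) = \exp(\beta_r).
\]
```

Under this reading the reported odds ratio for rank \(r\) is \(\exp(\beta_r)\), so an odds ratio of 1 corresponds to \(\beta_r = 0\). Factors that are common to all ten alternatives in a page view, such as time of day, cancel from the numerator and denominator, which is why they need not be modeled.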

Table 7 Page rank and domain reputation as determinants of click odds using rank only
Table 8 Page rank and domain reputation as determinants of click odds using rank and mean rank
Table 9 Page rank and domain reputation as determinants of click odds using rank, mean rank and brand
Table 10 Page rank and domain reputation as determinants of click odds using rank, mean rank and domain fixed effects

There is a large variation among the query terms in the magnitude of the rank effects, but the broad qualitative findings are remarkably similar. Table 7 gives the effect of rank without controlling for website relevance for each of the four query terms. We can see that being in rank 1 increases the odds of being clicked on, relative to rank 10, by between 11 times (for “Free Movies”) and 220 times (for “Phone Numbers”). This is roughly twice as large as the effect of being in rank 2, though the exact proportion varies somewhat between query terms.

There are two ways in which we control for website relevance. The first, as reported in Table 8 for each of the four query terms, is to control for the mean rank of a website over the whole sample period. This is based on the idea that the mean rank of the website does reflect the search engine’s estimate of its likely relevance to users, while deviations within the sample period from this mean rank do not reflect variations in likely relevance.

Our “Mean Rank” variable is the inverse of the arithmetic mean of the rank number, so that higher values of the variable reflect higher ranks (i.e. those closer to rank 1). Controlling for Mean Rank lowers the odds ratio for rank 1 by over half for all queries except “Free Movies”, where it has a small lowering effect.

Our second way of controlling for website relevance, as reported in Table 9 for each of the four query terms, is to use a dummy variable that we call “Brand” for any website that appears in rank 1 during the sample period in more than 0.5 per cent of the total number of SERP observations.Footnote 10 This definition captures the idea that such websites are likely to be perceived as more relevant. Adding this variable to the specification that includes Mean Rank reduces further to a small extent the odds ratio for rank 1, except for “Phone Numbers” where it increases the ratio slightly, probably due to collinearity with Mean Rank.
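The two relevance controls can be constructed from view-level records roughly as follows. This is a sketch of one reading of the definitions above: the record layout `(serp_id, website, rank)`, the function name `relevance_controls` and the toy data are assumptions, and the 0.5 per cent threshold is applied to the count of distinct SERPs in the input.

```python
from collections import defaultdict


def relevance_controls(impressions, brand_share=0.005):
    """Build the Mean Rank and Brand controls for every website.

    impressions: iterable of (serp_id, website, rank) tuples
    (a hypothetical layout). Returns {website: (mean_rank, brand)}.
    """
    ranks = defaultdict(list)
    serp_ids = set()
    for serp_id, website, rank in impressions:
        ranks[website].append(rank)
        serp_ids.add(serp_id)
    n_serps = len(serp_ids)

    controls = {}
    for website, rs in ranks.items():
        # Mean Rank: inverse of the arithmetic mean of the rank number,
        # so larger values mean the site sits closer to rank 1 on average
        mean_rank = 1.0 / (sum(rs) / len(rs))
        # Brand: 1 if the site is in rank 1 on more than brand_share
        # of all SERP observations
        rank1_views = sum(1 for r in rs if r == 1)
        brand = int(rank1_views / n_serps > brand_share)
        controls[website] = (mean_rank, brand)
    return controls


# Toy data, invented for illustration: four SERPs, three websites
toy = [
    (1, "a.example", 1), (1, "b.example", 2), (1, "c.example", 3),
    (2, "a.example", 1), (2, "b.example", 2), (2, "c.example", 3),
    (3, "a.example", 1), (3, "b.example", 2), (3, "c.example", 3),
    (4, "b.example", 1), (4, "a.example", 2), (4, "c.example", 3),
]
controls = relevance_controls(toy)
print(controls)
```

Both controls are fixed per website over the sample period, so any remaining effect of the rank dummies is identified from deviations of a website's rank from its usual position.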

As a robustness check we use separate fixed effects for each of the “Brand” websites instead of a single dummy variable, as reported in Table 10 for each of the four query terms. This lowers substantially the coefficient on Mean Rank, turning it negative in three cases out of four, without substantially altering the coefficients on Rank. This appears to indicate that the fixed effects and the Mean Rank variable are substantially collinear.

Overall, it is striking that even after these controls for relevance there is a large, statistically and economically very significant effect of being in rank 1 as compared to rank 10. Even in the most conservative specification (number 3), the odds ratios vary from around 9 (for “Free Movies”) to over 120 (for “Phone Numbers”), and this effect is at least 50 % higher and sometimes more than twice as high as the effect of being in rank 2. The effects also decline as rank declines, roughly but not strictly monotonically.

4 Forces Behind the Impact of Rank

If page rank exerts a strong causal influence on the likelihood that users click on a website, what is the reason for that effect? In particular, to what extent is it due to the fact that higher ranked websites are more conspicuous on the page, and to what extent is it due to the reputation of the search engine for delivering relevant results in the higher ranks?

To explore this question we make use of a simple insight: The reputation of the search engine for relevance will be a substitute for any reputation for relevance that the website may have in its own right. Websites with strong positive reputations will require less assistance from the reputation of the search engine.

We can therefore compare two alternative models of the process by which users search: In the first, reputation-based model, users compare all the domains that appear on a page and decide which is most likely to meet their needs, based on combining information based on the page rank (given the reputation of the search engine for reliability) with information based on the domain’s own reputation. In this case we expect to see that the higher the reputation of the domain in its own right, the less additional benefit it will gain from being in a high rank.

In the second, conspicuousness-based model, users begin at the most conspicuous point on the results page (typically though not necessarily the first rank), and decide whether to click or to continue to the next result. In this setting the reasons why a user will click immediately may be situational (such as that the user is in a hurry) or based on recognition of the domain as one with a good reputation for relevance. In this model of sequential choice, the websites with high own reputations will benefit more rather than less from being in a high rank. They have more to gain from being brought to the user’s attention since they are more likely to hold such attention and convert it into a decision to click.

This suggests looking for interaction relationships between our rank variable and our separate measures of website relevance: Mean Rank and Brand. If the positive impact of being in a high rank is due principally to reputation, we should observe a smaller additional effect of reputation (as measured by our relevance indicators) for websites that appear in the higher ranks. Conversely, if it is due principally to conspicuousness, we should observe a larger additional effect of reputation (as measured by our relevance indicators) for websites that appear in higher ranks.

Tables 11, 12 and 13 explore this question by interacting our relevance measures with page rank for each of the four query terms. For both Mean Rank and Brand, we include an interaction term for the variable for the first two ranks only.Footnote 11 If the coefficient on this interaction variable is greater than one, relevance is more important for websites in higher ranks; if it is less than one, relevance is less important in higher ranks.
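The interaction specification can be written out as follows, as an illustration only. The notation is assumed and does not appear in the original: \(r(j)\) is the rank in which website \(j\) is displayed, \(M_j\) is its Mean Rank variable, and \(v_j\) is the systematic part of the utility of clicking on \(j\). The Brand dummy would enter in the same way as \(M_j\).

```latex
\[
v_j = \beta_{r(j)} + \gamma\, M_j + \delta\, M_j \cdot \mathbf{1}\{ r(j) \le 2 \}
\]
\[
\text{odds multiplier per unit of } M_j =
\begin{cases}
  \exp(\gamma)\,\exp(\delta) & \text{if } r(j) \le 2, \\
  \exp(\gamma)               & \text{if } r(j) \ge 3.
\end{cases}
\]
```

The quantity reported for the interaction term then corresponds to \(\exp(\delta)\): a value above one means that relevance matters more in the top two ranks, as the conspicuousness-based model predicts, while a value below one means that it matters less, as the reputation-based model predicts.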

Table 11 Interaction of page rank and reputation using mean rank in top ranks
Table 12 Interaction of page rank and reputation using brand in top ranks
Table 13 Interaction of page rank and reputation using both mean rank in top ranks and brand in top ranks

The results tend to indicate a lower effect of Mean Rank in the top two ranks than in the remaining ranks, but the evidence is not unequivocal. For two out of four query terms the interaction term is strongly and significantly less than one, while for one other query it is insignificantly less than one and for the other it is insignificantly greater than one. For Brand three of the interaction terms are less than one but only one is significantly so, while the other is significantly greater than one. When both sets of regressors are included together, no clear pattern emerges, though the coefficients indicate the likelihood of significant collinearity.

On balance, the evidence is suggestive rather than conclusive. Nevertheless, it suggests that reputation is a stronger force than conspicuousness in explaining the causal impact of page rank on click probabilities, but that conspicuousness has a role to play as well.

5 Conclusion

We have shown in this paper that when a website appears in a high rank on a Search Engine Results Page it has a substantial and highly significant positive causal effect on the probability that a user will click on the website. We have done so using a unique data set that allows us to abstract from the fact that search engines determine rank partly by predicting the likely relevance of websites to user needs.

We have shown that this estimation is robust to possible concerns about the endogeneity of page ranking. We have further provided evidence that suggests that rank influences CTRs somewhat more through reputation, by substituting the reputational capital of the search engine for the reputation of individual websites, than through conspicuousness. However, there is also some evidence that conspicuousness plays a role, which implies that one of the assets that search engines deploy is access to the scarce attention of users.